Add RT acceleration structure abstraction with size queries and resource allocation#1232
Merged
Conversation
3bfe062 to
fd17d69
Compare
f6ae856 to
7527088
Compare
7527088 to
546235a
Compare
546235a to
3596fcc
Compare
EmilioLaiso
reviewed
Jun 1, 2026
3596fcc to
cfd4d51
Compare
cfd4d51 to
f6d9ff4
Compare
This was referenced Jun 2, 2026
VkBuffer Buffer;
VkDeviceMemory Memory;
VkDeviceAddress DeviceAddress;
PFN_vkDestroyAccelerationStructureKHR FnDestroyAS;
Contributor
Should we group all the acceleration structure extension functions together in a struct instead?
Collaborator
Author
The AS extension function pointers are already grouped:
offload-test-suite/lib/API/VK/Device.cpp
Lines 1154 to 1159 in 19ff2e5
The lone PFN_vkDestroyAccelerationStructureKHR held on VulkanAccelerationStructure here is the only one needed for cleanup of an AS handle; not worth dragging the whole struct in for a single function.
f6d9ff4 to
19ff2e5
Compare
bda11a5 to
4e7f57e
Compare
4e7f57e to
824e33c
Compare
bogner
approved these changes
Jun 5, 2026
…rce allocation

Introduce the foundational types for ray tracing acceleration structures: abstract `AccelerationStructure` base class, geometry/instance descriptors, BLAS/TLAS build-request structs with size queries, and AS resource allocation across DX12, Vulkan, and Metal. Recording build commands lands in a follow-up commit on top of the ComputeEncoder abstraction.

DX12: `ID3D12DeviceX` typedef bumps from `ID3D12Device2` to `ID3D12Device5`, so the existing `Device` member directly exposes `GetRaytracingAccelerationStructurePrebuildInfo` (and the eventual `CreateStateObject` / `SetPipelineState1` for the PSO RT epic), with no separate `Device5` member or post-create `QueryInterface` dance. `D3D12CreateDevice` is already invoked with `IID_PPV_ARGS(&Device)`, so the bump naturally requires the adapter to support the Device5 interface (Win10 1809+); RT-capable hardware is selected by the `acceleration-structure` lit feature regardless.

Vulkan device creation switches to a single `vkGetPhysicalDeviceFeatures2` call covering every extension feature struct we care about (atomic-int64, mesh-shader, acceleration-structure, BDA on 1.1): each struct is chained into `pNext` before the query, and post-query we verify the gating bool and clear the sub-features we don't enable (capture-replay, indirect-build, multiview, etc.).

Drive-by: rather than letting `vkCreateDevice` reject the device with a generic `VK_ERROR_FEATURE_NOT_PRESENT`, the code now returns a descriptive `llvm::Error` naming the extension and the bool that came back zero, pinpointing the case where a driver advertises an extension but reports its base feature as `VK_FALSE`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
824e33c to
a7cdb63
Compare
MarijnS95
added a commit
that referenced
this pull request
Jun 11, 2026
Closes #1158 🥳

## Summary

Wire up acceleration-structure descriptor binding end-to-end across all three backends so shaders can actually consume the TLAS that `buildPipelineAccelerationStructures()` produced, completing the stack and promoting the three InlineRT tests from XFAIL to passing.

Per-resource AS handling lands in a new per-backend `createAS()` (paired with `createSRV()` / `createUAV()` / `createCBV()`): a pure single-create that queries TLAS sizes via `Dev.getTLASBuildSizes()` and allocates the handle via `Dev.createTLAS()`, returning the `unique_ptr` to the caller. It has no `InvocationState` or `Pipeline` access: the multi-create (`createBuffers()` / `createResources()`) records the handle in `InvocationState::TLASes` (a `StringMap` keyed by `TLASDesc::Name`) and wires a non-owning AS pointer into the per-resource bundle the binding loop reads. The shared AS-build helper picks up that map and walks `P.AccelStructs.TLAS` to pair each YAML descriptor with its pre-allocated handle by name (TLASes without a map entry are skipped, i.e. declared but unbound). BLAS handles are still allocated by the helper itself since BLASes aren't user-bindable.

`executeProgram()` in each backend now runs as:

- `createBuffers` / `createResources` (`createAS()` allocates TLAS handles)
- open encoder → `buildPipelineAccelerationStructures()` → end

- **Vulkan**: `createDescriptorPool()` counts AS descriptors in a separate scalar (the KHR enum value `1000150000` doesn't fit in the indexed array used for the core types) and emits one `VkDescriptorPoolSize` for them. `createDescriptorSets()` reads the resolved `VulkanAccelerationStructure` handle from `ResourceRef.AS` (populated by `createResources()`) and writes it through a `VkWriteDescriptorSetAccelerationStructureKHR` chained on the descriptor write's `pNext`. The dispatch's pre-barrier dst access now includes `VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_KHR` so the prior AS-build's writes are visible to the shader's RayQuery reads. Device creation enables `VK_KHR_ray_query` using the same chain-pre-query + error-on-flag-mismatch pattern that #1232 set up for the AS / BDA extensions; without `VK_KHR_ray_query` enabled the shader's `OpRayQueryProceedKHR` instructions silently no-op and `Output` reads back zero. `copyResourceDataToDevice()` short-circuits AS bundles via a new `ResourceBundle::isAccelerationStructure()` predicate (no host buffer to barrier).
- **DX12**: writes a `D3D12_SRV_DIMENSION_RAYTRACING_ACCELERATION_STRUCTURE` SRV with the AS GPU virtual address as `Location` into the heap slot that `createBuffers()` reserved (`CreateShaderResourceView()` with a null resource, since the AS data lives in the buffer pointed to by `Location`).
- **Metal**: the Metal shader converter doesn't bind the AS directly; the shader reads a buffer containing an `IRRaytracingAccelerationStructureGPUHeader` that holds the AS's `gpuResourceID` plus a pointer to an instance-contributions array. `createBuffers()` allocates and fills both buffers per AS-descriptor entry, then points the descriptor at the header buffer's GPU address. The TLAS itself is built with the `UserID` instance-descriptor variant so HLSL `CommittedInstanceID()` returns the YAML-specified per-instance ID instead of the array index.

The three InlineRT tests now actually exercise the AS end-to-end: `TraceRayInline()` issues a RayQuery against `Scene` and writes a hit-dependent value into `Output` (the instance ID for `multi-instance`, 1/0 otherwise). The catch-all `XFAIL: *` is dropped; `XFAIL: Clang` remains. The test shaders also gain explicit `[[vk::binding]]` annotations because dxc's default HLSL→SPIR-V binding mapping collides `Scene`'s `t0` with `Output`'s `u0` at binding 0, which VVL flags as a descriptor type mismatch.

## Test plan

Local on an NVIDIA RTX 3060:

- [x] Linux Vulkan (native `offloader`)
- [ ] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):

- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [ ] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
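The Vulkan note in the commit message above says AS descriptors are counted in a separate scalar because the KHR enum value `1000150000` cannot index the array used for core descriptor types. The following is a minimal sketch of that counting idea, not the code from the PR. The constants stand in for the real `VkDescriptorType` values (core types occupy 0 through 10), and `PoolSize` and `countPoolSizes` are hypothetical names so the snippet builds without `vulkan.h`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins for VkDescriptorType values (hypothetical names, real numbers).
constexpr uint32_t kStorageBuffer = 7;                      // core type
constexpr uint32_t kNumCoreTypes = 11;                      // core types are 0..10
constexpr uint32_t kAccelerationStructureKHR = 1000150000;  // extension type

// Stand-in for VkDescriptorPoolSize.
struct PoolSize {
  uint32_t Type;
  uint32_t Count;
};

// Count descriptors per type. Core types index a small array directly;
// the AS type is far outside that range, so it gets its own scalar.
std::vector<PoolSize> countPoolSizes(const std::vector<uint32_t> &Descriptors) {
  std::array<uint32_t, kNumCoreTypes> CoreCounts{};
  uint32_t ASCount = 0;
  for (uint32_t Type : Descriptors) {
    if (Type == kAccelerationStructureKHR) {
      ++ASCount;
    } else {
      assert(Type < kNumCoreTypes && "unexpected non-core descriptor type");
      ++CoreCounts[Type];
    }
  }

  std::vector<PoolSize> Sizes;
  for (uint32_t Type = 0; Type < kNumCoreTypes; ++Type)
    if (CoreCounts[Type] != 0)
      Sizes.push_back({Type, CoreCounts[Type]});
  if (ASCount != 0)
    Sizes.push_back({kAccelerationStructureKHR, ASCount});
  return Sizes;
}
```

With two storage buffers and one TLAS this yields two pool-size entries, one per descriptor type in use.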
EmilioLaiso
pushed a commit
that referenced
this pull request
Jun 15, 2026
## Summary

Stacks on top of #1232 and #1245 to add five small InlineRT tests, each isolating one `RayQuery` method on the existing single-triangle BLAS:

- `miss-status.test`: `COMMITTED_NOTHING` path (ray points away from geometry)
- `ray-t.test`: `CommittedRayT()` returns exact `1.0` for the axis-aligned hit
- `barycentrics.test`: `CommittedTriangleBarycentrics()` at world `(0,0,0)` returns exactly `(0.25, 0.25)`
- `world-ray-echo.test`: `WorldRayOrigin` / `WorldRayDirection` / `RayTMin` / `RayFlags` round-trip into a structured buffer; passes `-fvk-use-dx-layout` so SPIR-V matches DXIL's tight `float3` packing and the expected bytes are portable across DX / VK / MTL.
- `tmin-tmax-clip.test`: two queries against the same BLAS: one with `TMin` past the hit, one with `TMax` before it; both must miss.

First batch out of #1258 (inline-RT test coverage epic): the easiest wins, no framework / YAML changes required.

## Test plan

Local on an NVIDIA RTX 3060:

- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [x] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):

- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso
pushed a commit
that referenced
this pull request
Jun 15, 2026
## Summary

Stacks on top of #1232 / #1245. Adds the first InlineRT test with a non-trivial BLAS layout (three triangles tiled along x at `x = -4, 0, +4`) and a 3-lane dispatch that fires one ray per lane straight down at its own triangle. Each lane's `CommittedPrimitiveIndex()` must equal its lane index. Also exercises divergent rays per thread for free.

Seed test for the multi-primitive / multi-geometry BLAS bullets in the inline-RT coverage epic (#1258). Independent of the other InlineRT test PRs (#1271, #1274); only adds a new test file.

## Test plan

Local on an NVIDIA RTX 3060:

- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):

- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso
pushed a commit
that referenced
this pull request
Jun 15, 2026
## Summary

Stacks on top of #1232 / #1245. Three TLAS instances at `x = -5, 0, +5` with `InstanceMask` values `0x01` / `0x02` / `0x04` and `InstanceID`s `0` / `1` / `2`. A 3-lane dispatch fires one ray per lane straight down at its own instance column, but every ray uses `InstanceInclusionMask = 0x02`, so only the middle instance survives the mask test. Lane 1 reports `InstanceID = 1`; lanes 0 and 2 miss.

Covers the `InstanceInclusionMask` filtering bullet in the inline-RT coverage epic (#1258). Independent of the other InlineRT test PRs (#1271, #1272); only adds a new test file.

## Test plan

Local on an NVIDIA RTX 3060:

- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):

- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
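The mask test this commit exercises can be modelled on the CPU as a bitwise AND of the two 8-bit masks, where an instance is skipped when the result is zero. The sketch below reproduces the expected per-lane results under that model. `Instance`, `passesMask`, and `traceLane` are hypothetical names for illustration, and the geometry is simplified to "lane N aims at instance N" as the description states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative stand-in for one TLAS instance entry.
struct Instance {
  uint8_t InstanceMask;
  uint32_t InstanceID;
};

// An instance takes part in traversal only if the two masks share a set bit.
bool passesMask(const Instance &I, uint8_t InclusionMask) {
  return (I.InstanceMask & InclusionMask) != 0;
}

// Lane N fires straight at instance N; returns the committed InstanceID,
// or nullopt when the instance is filtered out (a miss).
std::optional<uint32_t> traceLane(const std::vector<Instance> &Scene,
                                  std::size_t Lane, uint8_t InclusionMask) {
  assert(Lane < Scene.size() && "one instance column per lane");
  const Instance &Target = Scene[Lane];
  if (!passesMask(Target, InclusionMask))
    return std::nullopt;
  return Target.InstanceID;
}
```

For the scene in the description (masks `0x01` / `0x02` / `0x04`, inclusion mask `0x02`), only lane 1 gets a value back, matching the expected `InstanceID = 1`.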
EmilioLaiso
pushed a commit
that referenced
this pull request
Jun 15, 2026
## Summary

Stacks on top of #1232 / #1245. Two rays at the existing single-triangle BLAS, one from +z (sees the front face per the default winding convention all three backends share) and one from -z (sees the back face), with the `RAY_FLAG_CULL_BACK_FACING_TRIANGLES` template flag set. Lane 0 must hit and lane 1 must miss.

Independent of the other InlineRT test PRs (#1271, #1272, #1274); only adds a new test file.

## Test plan

Local on an NVIDIA RTX 3060:

- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):

- [x] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso
added a commit
that referenced
this pull request
Jun 17, 2026
Depends on #1245

## Summary

Foundational PR in the PSO-based raytracing bring-up series tracked in #1268. Stacks on top of #1245 (which depends on #1244, which depends on #1232). Only the top commit on this branch is new; the rest are the inline-RT bring-up already in review.

Lays out the framework-side surface needed by the upcoming backend PRs:

- `ShaderPipelineKind::RayTracing` plus six new `Stages` (`RayGeneration`, `Miss`, `ClosestHit`, `AnyHit`, `Intersection`, `Callable`), with `isRayTracingStage` / `Pipeline::isRayTracing()` helpers.
- YAML schema for an RT pipeline: `HitGroup` (Triangles | Procedural, ClosestHit + optional AnyHit / Intersection), `RayTracingPipelineConfig` (MaxTraceRecursionDepth, MaxPayloadSizeInBytes, MaxAttributeSizeInBytes, optional PipelineFlags), and `ShaderBindingTable` (raygen / miss / hit-group / callable records, each with optional reserved LocalRootData bytes).
- `validatePipelineKind` allows duplicate RT stages (a pipeline can have several miss / hit-group shaders, which the existing duplicate check would have rejected), requires at least one RayGeneration, and rejects mixing with Compute/Vertex/Mesh. The reverse check rejects HitGroups / RTConfig / SBT on any non-RT pipeline. `validateDispatchParameters` reinterprets `DispatchGroupCount` as `{Width, Height, Depth}` for the upcoming DispatchRays and forbids VertexCount on RT.
- Existing `Stages` switches across the backends grow the six RT cases: Vulkan maps each one to its `VK_SHADER_STAGE_*_KHR` bit ready for PR 2; Metal unreachables on RT (`metal_irconverter` takes a different route); raster pipeline `setShader` (Traditional + MeshShader variants) adds them to the existing unreachable group.
- Each backend's `executeProgram` gets a terminal `else if (P.isRayTracing())` that returns a "not yet supported on <backend>" error so PR2/3/4 just have to replace it.
- `%dxc_target_lib` lit substitution (same compiler binary, separate name for `-T lib_6_x` library targets); `raytracing-pipeline` available-feature gated on DX `RaytracingTier >= 1.0` and the Vulkan `VK_KHR_ray_tracing_pipeline` extension being reported by the device.
- Foundational `test/Feature/RT/raygen-roundtrip.test` exercising the full schema (raygen+miss+CH, BLAS/TLAS, HitGroups, RTConfig, SBT). Gated on `raytracing-pipeline` and `XFAIL: *` until each backend bring-up lands.

## Test plan

Local on an NVIDIA RTX 3060:

- [x] Linux Vulkan (native `offloader`)
- [ ] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):

- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: EmilioLaiso <emilio@traverseresearch.nl>
manon-traverse
pushed a commit
that referenced
this pull request
Jun 18, 2026
## Summary

First per-backend bring-up in the PSO raytracing series (#1268). Stacks on top of #1270 (foundational schema + lit infrastructure + XFAILed test). Adds the API surface needed by the upcoming D3D12 and Metal PRs plus the Vulkan implementation behind it.

API surface:

- `ComputeEncoder::dispatchRays(PSO, SBT, W, H, D)` virtual on the existing compute encoder (no separate `RayTracingEncoder`).
- `Device::createPipelineRT` + `Device::createShaderBindingTable` virtuals with a new `RayTracingPipelineCreateDesc` carrying the DXIL library blob, the shader entry points (Stage + EntryPoint), the hit-group list, and the `RayTracingPipelineConfig`.
- `include/API/ShaderBindingTable.h` holding the abstract runtime base; backend SBT classes derive from it with LLVM-style `classof` / `cast<>`.
- Rename: PR #1270's YAML struct `ShaderBindingTable` → `ShaderBindingTableDesc` so the bare name is free for the runtime class (parallel to `BLASDesc` / `TLASDesc` vs `AccelerationStructure`). YAML key stays `ShaderBindingTable:`.
- D3D12 and Metal stub the new methods with not-yet-supported errors; their bring-up lands in follow-up PRs.

Vulkan implementation:

- The pre-existing `RaytracingFunctions RT` struct lumped AS and RT-pipeline entry points together; they split into `ASFunctions AS` + `RTPipelineFunctions RT` so the names match the actual feature-gate split (AS + ray-query is a complete configuration; RT pipeline layers on top). `HasRayTracingSupport` renames to `HasASSupport`; `HasRTPipelineSupport` tracks the new extension.
- `VK_KHR_ray_tracing_pipeline` is requested when reported, with `VkPhysicalDeviceRayTracingPipelineFeaturesKHR` chained pre-query and the gating `rayTracingPipeline` bool checked post-query (matches the AS / BDA pattern from #1232). Sub-features the tests don't exercise (capture-replay / indirect-trace / traversal-primitive-culling) are cleared.
- Function pointers `vkCreateRayTracingPipelinesKHR`, `vkGetRayTracingShaderGroupHandlesKHR`, `vkCmdTraceRaysKHR` resolve once at device creation. `VkPhysicalDeviceRayTracingPipelinePropertiesKHR` is cached at the same time for SBT handle size / alignment / base alignment.
- `VKRayTracingPipelineState` derives from `VulkanPipelineState`; an `IsRayTracing` flag on the base lets the existing Vulkan `cast<>` path stay polymorphic without adding a new `GPUAPI` value. The derived class also carries a `StringMap<uint32_t>` resolving each shader `EntryPoint` or hit-group `Name` to its index in the pipeline's group array, plus per-bucket counts so the SBT builder can slice the contiguous handle blob into raygen / miss / hit / callable regions.
- `createPipelineRT` builds a single `VkShaderModule` (the DXIL library compiles to one SPIR-V module with multiple `OpEntryPoint`s), one `VkPipelineShaderStageCreateInfo` per `Shader` entry, and one `VkRayTracingShaderGroupCreateInfoKHR` per general shader / hit group. Pipeline layout uses the same `createPipelineLayout` helper as the compute path, gated on all six RT stage flags so any binding can be consumed from any RT shader.
- `createShaderBindingTable` allocates a host-visible coherent buffer big enough for four regions, then lays out each entry as `[handle bytes][LocalRootData bytes][padding-to-stride]`. Per-region stride = `align(handleSize + max-LocalRootData-in-region, handleAlignment)`; per-region size = `align(count * stride, baseAlignment)`. LocalRootData support comes for free from PR #1270's SBT schema; the test doesn't exercise it yet. Each region's `VkStridedDeviceAddressRegionKHR` derives from the buffer's `vkGetBufferDeviceAddress`.
- `dispatchRays` binds the pipeline at `VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR`, emits a pre-barrier with `ACCELERATION_STRUCTURE_READ_BIT_KHR | SHADER_READ_BIT | SHADER_WRITE_BIT` dst access into `RAY_TRACING_SHADER_BIT_KHR`, then calls `vkCmdTraceRaysKHR` with the SBT's four region structs.
- `createCommands` picks the new bind point for RT pipelines so `vkCmdBindDescriptorSets` binds to the right point. `executeProgram`'s `isRayTracing` branch builds a `RayTracingPipelineCreateDesc` from the `Pipeline`, calls `createPipelineRT` then `createShaderBindingTable`, and keeps both on `InvocationState` for the dispatch.

Test side: `raygen-roundtrip.test`'s `XFAIL` becomes `Clang, DirectX, Metal`. On a DXC + Vulkan combo with the device reporting `VK_KHR_ray_tracing_pipeline` this should PASS; the Clang token still catches the compile failure on the Linux + `clang-dxc` loop where `[shader("raygeneration")]` doesn't yet lower to SPIR-V.

## Test plan

Local on an NVIDIA RTX 3060:

- [x] Linux Vulkan (native `offloader`)
- [ ] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):

- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: EmilioLaiso <emilio@traverseresearch.nl>
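The per-region stride and size formulas quoted for `createShaderBindingTable` can be checked with a small worked example. This is a sketch of the arithmetic only, not the PR's implementation. `alignUp`, `RegionLayout`, and `layoutRegion` are hypothetical names, and the handle size and alignments used in the usage note are illustrative values rather than properties queried from a device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Round Value up to the next multiple of Alignment (must be a power of two).
constexpr uint64_t alignUp(uint64_t Value, uint64_t Alignment) {
  return (Value + Alignment - 1) & ~(Alignment - 1);
}

struct RegionLayout {
  uint64_t Stride; // bytes between consecutive records in the region
  uint64_t Size;   // total bytes reserved for the region
};

// One element of LocalRootDataSizes per record in the region
// (raygen, miss, hit-group, or callable).
RegionLayout layoutRegion(const std::vector<uint64_t> &LocalRootDataSizes,
                          uint64_t HandleSize, uint64_t HandleAlignment,
                          uint64_t BaseAlignment) {
  assert((HandleAlignment & (HandleAlignment - 1)) == 0 && HandleAlignment != 0);
  assert((BaseAlignment & (BaseAlignment - 1)) == 0 && BaseAlignment != 0);

  uint64_t MaxLocal = 0;
  for (uint64_t Bytes : LocalRootDataSizes)
    MaxLocal = std::max(MaxLocal, Bytes);

  // stride = align(handleSize + max-LocalRootData-in-region, handleAlignment)
  const uint64_t Stride = alignUp(HandleSize + MaxLocal, HandleAlignment);
  // size = align(count * stride, baseAlignment)
  const uint64_t Size =
      alignUp(LocalRootDataSizes.size() * Stride, BaseAlignment);
  return {Stride, Size};
}
```

Assuming a 32-byte handle, 32-byte handle alignment, and 64-byte base alignment, a region with one record and no local data gets stride 32 and size 64. A region with two records, one of which carries 8 bytes of local data, gets stride 64 and size 128, because every record in a region shares the stride of the largest one.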
For #1158

Introduce the foundational types for ray tracing acceleration structures:

- abstract `AccelerationStructure` base class
- geometry/instance descriptors
- BLAS/TLAS build-request structs with size queries
- AS resource allocation across DX12, Vulkan, and Metal

Recording the actual build commands lands in a follow-up commit on top of the `ComputeEncoder` abstraction.

## DX12 device interface bump

`ID3D12DeviceX` typedef goes from `ID3D12Device2` to `ID3D12Device5`, so the existing `Device` member directly exposes `GetRaytracingAccelerationStructurePrebuildInfo` (and the eventual `CreateStateObject` / `SetPipelineState1` for the PSO RT epic), with no separate `Device5` member or post-create `QueryInterface` dance. `D3D12CreateDevice` is already invoked with `IID_PPV_ARGS(&Device)`, so the bump naturally requires the adapter to support the Device5 interface (Win10 1809+); RT-capable hardware is selected by the `acceleration-structure` lit feature regardless.

## Vulkan device-creation refactor

- Single `vkGetPhysicalDeviceFeatures2` call: every extension feature struct we care about (`atomic-int64`, `mesh-shader`, `acceleration-structure`, BDA on 1.1) is chained into `pNext` before the query. Post-query we verify each extension's gating feature bool and clear the sub-features we don't need (capture-replay, indirect-build, multiview, etc.).
- Drive-by: rather than letting `vkCreateDevice` reject the device with a generic `VK_ERROR_FEATURE_NOT_PRESENT`, the code returns a descriptive `llvm::Error` naming the extension and the bool that came back zero, pinpointing the case where a driver advertises an extension but reports its base feature as `VK_FALSE`. The duplicate `queryDeviceExtensions` call is gone, and `EnabledDeviceExtensions` is now one list (the previous separate `EnabledExtensions` vector silently dropped mesh-shader / atomic-int64 entries when RT was enabled).
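The chain-before-query, verify-and-trim-after pattern described for Vulkan device creation can be sketched without the Vulkan headers. Everything below is a stand-in: `ChainNode`, `AccelStructFeatures`, `MeshShaderFeatures`, `fakeQueryFeatures`, and `queryAndTrim` are hypothetical, the "driver" is simulated, and a `std::optional<std::string>` takes the place of the `llvm::Error` the PR returns. Treat it as an illustration of the control flow, not the code in `Device.cpp`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

using Bool32 = uint32_t;
enum class SType { AccelStruct, MeshShader };

// Stand-in for the sType/pNext header shared by Vulkan feature structs.
struct ChainNode {
  SType Type;
  ChainNode *Next = nullptr;
  explicit ChainNode(SType T) : Type(T) {}
};

struct AccelStructFeatures : ChainNode {
  AccelStructFeatures() : ChainNode(SType::AccelStruct) {}
  Bool32 accelerationStructure = 0; // gating bool
  Bool32 captureReplay = 0;         // sub-feature we do not enable
  Bool32 indirectBuild = 0;         // sub-feature we do not enable
};

struct MeshShaderFeatures : ChainNode {
  MeshShaderFeatures() : ChainNode(SType::MeshShader) {}
  Bool32 meshShader = 0;          // gating bool
  Bool32 multiviewMeshShader = 0; // sub-feature we do not enable
};

// Simulated driver: walks the chain once and fills every struct it finds.
void fakeQueryFeatures(ChainNode *Head, bool ReportsAS) {
  for (ChainNode *N = Head; N != nullptr; N = N->Next) {
    if (N->Type == SType::AccelStruct) {
      auto *AS = static_cast<AccelStructFeatures *>(N);
      AS->accelerationStructure = ReportsAS ? 1 : 0;
      AS->captureReplay = 1;
      AS->indirectBuild = 1;
    } else {
      auto *MS = static_cast<MeshShaderFeatures *>(N);
      MS->meshShader = 1;
      MS->multiviewMeshShader = 1;
    }
  }
}

// Chain, query once, verify gating bools, then clear unused sub-features.
// Returns an error message, or nullopt on success.
std::optional<std::string> queryAndTrim(AccelStructFeatures &AS,
                                        MeshShaderFeatures &MS,
                                        bool ReportsAS) {
  assert(AS.Next == nullptr && "chain must start empty");
  AS.Next = &MS; // chain every struct before the single query
  fakeQueryFeatures(&AS, ReportsAS);

  if (!AS.accelerationStructure)
    return std::string("acceleration-structure extension is advertised but "
                       "accelerationStructure came back VK_FALSE");
  if (!MS.meshShader)
    return std::string("mesh-shader extension is advertised but meshShader "
                       "came back VK_FALSE");

  AS.captureReplay = 0;
  AS.indirectBuild = 0;
  MS.multiviewMeshShader = 0;
  return std::nullopt;
}
```

The point of returning a message that names the extension and the bool is that the failure surfaces before device creation, instead of as a generic error code from the create call.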